-
- >Perhaps we should have all three.
-
- >I understand that these are the only SGML-conformant combinations. Is
- >this too much of a mess?
-
- I think so. The processing should be broken into two parts: SGML parsing,
- and application processing. The significance of newlines is an application
- issue: the SGML parser never throws out newlines in data (it does throw
- out newlines between tags and in some other places that I don't fully
- understand).
-
- These are the choices for SGML parsing:
-
- CDATA            all characters treated as data.
-                  Terminated by </A where A is any letter.
- RCDATA           characters and entity references only; &entity; recognized.
-                  Terminated by </A as above.
- mixed content    tags and #PCDATA.
-                  Tags, entity references, comments, etc. recognized.
-                  The pattern of tags and data is regulated
-                  by the element declaration.
- element content  tags only. Pattern of tags is regulated.
- ANY              like mixed content, but tags aren't regulated.
-
- CDATA is simplest to process, but you can't do things like
-
- char* any_string;
- printf("<XMP>%s</XMP>", any_string);
-
- because any_string might contain </A, and you're screwed.
-
- RCDATA is capable of the above construct, but at a cost:
-
- char* any_string;
- char* rcdata = HTML_replace_specials(any_string);
- printf("<XMP>%s</XMP>", rcdata);
- free(rcdata);
-
- where HTML_replace_specials changes '<' to &lt; (to prevent </A), '>' to &gt;
- (to prevent ]]>, the marked-section close delimiter. Ugh!), and
- '&' to &amp; (to prevent &xxx from being mistaken for an entity reference).
-
- But if you're going to go to that trouble, you might as well
- use mixed content. That's why I changed my mind about using RCDATA
- for XMP and LISTING elements.
-
- My current DTD only uses element content (for the HTML document element*),
- CDATA (for XMP and LISTING) and mixed content (for everything else).
-
- As to your suggestions...
-
- ><XMP> newlines significant no anchors CDATA
-
- This is already supported, except that most implementations don't
- quite parse CDATA correctly. The "newlines significant" isn't a
- parsing issue. It's an issue of how the application processes character
- data. Let's call the mode of application processing in which the
- characters are written to the screen as-is, rather than typeset into
- paragraphs, TYPEWRITER mode. We'll call the default TYPESET mode.
-
- ><PRE> newlines significant no anchors RCDATA
-
- The only implementation of the PRE tag that I know of looks more like:
-
- <PRE> newlines significant anchors PCDATA
-
- It's actually pretty clean: you use
- mixed content SGML parsing, and TYPEWRITER application processing.
- So I changed the name to TYPEWRITER, for no good reason, really.
-
- The newlines significant/no anchors/RCDATA is what I suggested for
- XMP and LISTING, so they could contain any string. But since current
- implementations don't process entities in these elements, it's
- not worth it.
-
- ><FIXED> newlines not significant anchors PCDATA
-
- This introduces a third application mode besides TYPEWRITER and TYPESET:
- it's kinda like TYPEWRITER, but you toss the newlines and start a new line
- at every <P> tag. I don't like it.
-
- >Tony, can you make a similar patch for <fixed> as above for Midas?
-
- You could, but it doesn't fit neatly into the current architecture.
- Tony wrote one widget to do TYPESET processing (SGMLCompoundText)
- and one to do TYPEWRITER processing (SGMLPlainText). The FIXED
- element calls for a new widget, or a modification of SGMLPlainText
- to ignore newlines in some cases. (You can't just use SGMLCompoundText
- with a fixed-width font, because it compresses whitespace.)
-
- >I have put Dan's new spec (which contains <typewriter> -- what's going on,
- >Dan?!) in the web at
- >http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/MarkUp.html with a link from
- >the current spec.
-
- Thanks.
-
- > The DTD was not in the tar file, so Dan's previous one is
- >linked in. This includes all Dan's test HTML.
-
- Ack! I think the DTD is pretty important. I'll get the new
- one there ASAP. I strongly suggest that _all_ data providers grab the DTD
- and the sgmls parser and try validating samples of the data they're
- serving up. It's the quickest and surest way to check for compliance.
- I need to write a section for data providers in the spec.
-
- >I would like to include <HEADER> and <BODY> tags too.
-
- * I wrestled with this at great length to come up with a DTD that
- lends _some_ structure to HTML without clashing badly with
- existing data.
-
- The document element declaration is:
- <!ELEMENT HTML O O ((TITLE? & NEXTID? & ISINDEX?), BODY)>
-
- The O O means the HTML start and end tags can be omitted.
- They'll be inferred by the parser. Since there's no #PCDATA
- in the content model, it has element content, so that
- whitespace between tags is thrown out.
-
- The TITLE, NEXTID, and ISINDEX elements can come in any order, and
- they are optional, but each can appear at most once,
- and they have to come before the BODY.
-
- I made the <BODY> tags minimizable so that current
- HTML, which has no <BODY> tags, is still legal. I couldn't
- seem to work in a HEADER element the same way.
-
- Dan
-
-
-